Mapping short DNA sequencing reads and calling variants using mapping quality scores

Abstract

New sequencing technologies promise a new era in the use of DNA sequence. However, some of these technologies produce very short reads, typically of a few tens of base pairs, and using these reads effectively requires new algorithms and software. In particular, there is a major issue in efficiently aligning short reads to a reference genome and handling ambiguity or lack of accuracy in this alignment. Here we introduce the concept of mapping quality, a measure of the confidence that a read actually comes from the position it is aligned to by the mapping algorithm. We describe the software MAQ that can build assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample. MAQ makes full use of mate-pair information and estimates the error probability of each read alignment. Error probabilities are also derived for the final genotype calls, using a Bayesian statistical model that incorporates the mapping qualities, error probabilities from the raw sequence quality scores, sampling of the two haplotypes, and an empirical model for correlated errors at a site. Both read mapping and genotype calling are evaluated on simulated data and real data. MAQ is accurate, efficient, versatile, and user-friendly. It is freely available at http://maq.sourceforge.net.
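
The mapping-quality idea can be illustrated in a few lines. This is a minimal sketch, not MAQ's implementation. It makes three assumptions:

- all candidate positions of a read are already known;
- the likelihood of the read at each position is approximated by the product of the error probabilities of its mismatching bases, taken from their Phred base qualities;
- every position is equally likely a priori.

The mapping quality is then the Phred-scaled posterior probability that the reported position is wrong, Q = -10 log10(1 - p). The function names and the `cap` parameter are hypothetical.

```python
import math
from typing import Sequence


def phred_to_prob(q: float) -> float:
    """Convert a Phred-scaled quality Q into a probability, 10^(-Q/10)."""
    return 10.0 ** (-q / 10.0)


def alignment_likelihood(mismatch_quals: Sequence[int]) -> float:
    """Assumed approximation of p(read | position): the product of the error
    probabilities of the mismatching bases. Matching bases are taken to
    contribute a factor close to 1 and are ignored."""
    likelihood = 1.0
    for q in mismatch_quals:
        likelihood *= phred_to_prob(q)
    return likelihood


def mapping_quality(candidates: Sequence[Sequence[int]], best: int,
                    cap: float = 60.0) -> float:
    """Phred-scaled probability that the read does NOT come from
    candidates[best].

    candidates[i] holds the base qualities of the mismatches at candidate
    position i. `cap` is a hypothetical upper bound so that a unique hit
    still gets a finite score.
    """
    likelihoods = [alignment_likelihood(c) for c in candidates]
    # Posterior of the reported position under a uniform prior over positions.
    posterior = likelihoods[best] / sum(likelihoods)
    p_wrong = max(1.0 - posterior, phred_to_prob(cap))
    return -10.0 * math.log10(p_wrong)


# Suppose a read matches one locus exactly and a paralogous locus with a
# single Q30 mismatch. It is placed at the first locus with a mapping
# quality close to 30.
print(round(mapping_quality([[], [30]], best=0), 2))
```

Two perfect hits give a mapping quality of about 3, a coin flip between the two loci. This is exactly the ambiguity the abstract refers to. A unique hit receives the cap.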
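
A toy version of the genotype-calling step shows how mapping quality can enter a Bayesian call. This is an illustrative simplification, not MAQ's model. It assumes:

- the site is biallelic;
- reads are independent;
- each base's effective error is the larger of its base error and its read's mis-mapping probability, capped at 0.5;
- the genotype prior is supplied by the caller.

It also omits the empirical model for correlated errors. Each read samples one of the two haplotypes, which gives the alternative-allele fractions 0, 1/2 and 1 for the three genotypes. All names and the prior values are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# (observed allele 'R' or 'A', Phred base quality, Phred mapping quality)
Observation = Tuple[str, int, int]

# Fraction of the two haplotypes carrying the alternative allele 'A'.
ALT_FRACTION = {"RR": 0.0, "RA": 0.5, "AA": 1.0}


def effective_error(base_q: int, map_q: int) -> float:
    """Error probability from the lower of the two qualities, capped at 0.5
    so that, in this two-allele sketch, a base is never worse than uninformative."""
    return min(0.5, 10.0 ** (-min(base_q, map_q) / 10.0))


def genotype_posteriors(obs: List[Observation],
                        prior: Dict[str, float]) -> Dict[str, float]:
    """Posterior probability of each genotype given the reads at one site."""
    log_post = {}
    for genotype, f in ALT_FRACTION.items():
        total = math.log(prior[genotype])
        for allele, base_q, map_q in obs:
            e = effective_error(base_q, map_q)
            # The read samples haplotype 'A' with probability f; an error
            # flips the observed allele.
            p_alt = f * (1.0 - e) + (1.0 - f) * e
            total += math.log(p_alt if allele == "A" else 1.0 - p_alt)
        log_post[genotype] = total
    # Normalise in log space to avoid underflow at high coverage.
    top = max(log_post.values())
    weights = {g: math.exp(v - top) for g, v in log_post.items()}
    norm = sum(weights.values())
    return {g: w / norm for g, w in weights.items()}


# Hypothetical prior: roughly one heterozygous site per thousand bases.
PRIOR = {"RR": 0.9985, "RA": 0.001, "AA": 0.0005}

ref_reads = [("R", 30, 60)] * 5
confident = genotype_posteriors(ref_reads + [("A", 30, 60)] * 2, PRIOR)
ambiguous = genotype_posteriors(ref_reads + [("A", 30, 3)] * 2, PRIOR)
print(max(confident, key=confident.get), max(ambiguous, key=ambiguous.get))
# prints: RA RR
```

The two alternative-allele reads are identical except for their mapping quality. At mapping quality 60 they tip the call to heterozygous. At mapping quality 3 they become uninformative and the call stays homozygous reference.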
